Overview
fetch_indices_ohlcv.py is a Phase 2.5 data acquisition script that fetches and maintains historical OHLCV (Open, High, Low, Close, Volume) data for major NSE indices. It uses a hybrid incremental approach that merges deep historical data with today’s live snapshot.
Pipeline Position: Phase 2.5 - Runs after stock OHLCV fetch, before market breadth processing
Critical Function: Provides index price data required for historical market breadth calculations and dashboard charting
Purpose
This script:
- Fetches deep historical OHLCV data from Dhan’s ScanX API
- Incrementally updates existing CSV files (only fetches missing date ranges)
- Merges today’s live snapshot from all_indices_list.json
- Outputs individual CSV files for each index
Input Files
all_indices_list.json
Index list with live market data, including today’s OHLCV snapshot. Required fields per index:
- Symbol: Index symbol (e.g., “NIFTY”, “NIFTY MIDCAP 150”)
- Exchange: Exchange code (e.g., “NSE”)
- Segment: Segment code
- Instrument: Instrument type
- IndexID: Unique index identifier
- Ltp: Current close price (used as today’s Close)
- Open, High, Low: Today’s OHLC values
- Volume: Today’s volume
Existing index OHLCV files (if present, only missing dates are fetched)
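A minimal sketch of loading and validating the input list, assuming the JSON is an array of index objects with the fields above (the sample values here are illustrative, not real market data):

```python
import json

# Minimal sample mirroring the required input fields (values illustrative)
sample = '''[{"Symbol": "NIFTY", "Exchange": "NSE", "Segment": "IDX",
              "Instrument": "INDEX", "IndexID": 13, "Ltp": 18030.75,
              "Open": 18000.0, "High": 18050.25, "Low": 17980.5,
              "Volume": 5000000}]'''
indices = json.loads(sample)

# Verify every index object carries the fields the script depends on
REQUIRED = {"Symbol", "Exchange", "Segment", "Instrument", "IndexID",
            "Ltp", "Open", "High", "Low", "Volume"}
missing = [i["Symbol"] for i in indices if not REQUIRED <= i.keys()]
```

In practice the script would read the real all_indices_list.json from disk; validating the field set up front makes missing-field failures explicit instead of surfacing later as KeyErrors.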
Output Files
Individual CSV files for each index with standardized naming.
Filename pattern: {SAFE_SYMBOL}.csv
- Spaces and special characters replaced with underscores
- Example: NIFTY_MIDCAP_150.csv, NIFTY.csv
CSV Structure:

```csv
Date,Open,High,Low,Close,Volume
2023-01-01,18000.00,18050.25,17980.50,18030.75,5000000
2023-01-02,18030.75,18100.00,18010.00,18085.50,5200000
```
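Given this layout, a downstream consumer can load any index file with the standard library. A sketch using the sample rows above (parsing from a string here so the example is self-contained):

```python
import csv
import io

# Sample content matching the CSV structure above (illustrative data)
sample = """Date,Open,High,Low,Close,Volume
2023-01-01,18000.00,18050.25,17980.50,18030.75,5000000
2023-01-02,18030.75,18100.00,18010.00,18085.50,5200000
"""

def load_index_rows(text):
    """Parse index OHLCV CSV text into a list of dicts with typed values."""
    rows = []
    for r in csv.DictReader(io.StringIO(text)):
        rows.append({
            "Date": r["Date"],
            "Open": float(r["Open"]),
            "High": float(r["High"]),
            "Low": float(r["Low"]),
            "Close": float(r["Close"]),
            "Volume": int(r["Volume"]),
        })
    return rows

rows = load_index_rows(sample)
```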
Processing Logic
1. Incremental Sync Detection
Checks existing CSV files and determines date ranges to fetch:
```python
existing_data_cache = {}
for idx in indices:
    sym = idx["Symbol"]
    safe_sym = get_safe_sym(sym)
    output_path = os.path.join(OUTPUT_DIR, f"{safe_sym}.csv")
    target_start = global_start_ts
    if os.path.exists(output_path):
        try:
            with open(output_path, "r") as f:
                rows = list(csv.DictReader(f))
            if rows:
                existing_data_cache[safe_sym] = rows
                last_row_date = rows[-1]["Date"]
                # Resume fetching from the day after the last stored row
                last_dt = datetime.strptime(last_row_date, "%Y-%m-%d")
                target_start = int(last_dt.timestamp()) + 86400
        except Exception:
            # Corrupted CSV: fall back to re-fetching the full history
            pass
```
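The target_start arithmetic can be checked in isolation: the day after the last stored row becomes the fetch start, so an up-to-date file yields no gap to fill. A self-contained sketch of that computation:

```python
from datetime import datetime

def next_fetch_start(last_row_date):
    """Return the epoch timestamp one day after the last stored Date,
    mirroring the target_start computation above."""
    last_dt = datetime.strptime(last_row_date, "%Y-%m-%d")
    return int(last_dt.timestamp()) + 86400

# The resulting timestamp falls on the following calendar day
ts = next_fetch_start("2024-01-10")
```

Note that adding 86400 seconds assumes no daylight-saving jump between the two midnights; for daily OHLCV data this is a harmless simplification.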
2. Chunked Historical Fetch
Fetches data in 120-day chunks using multithreading:
```python
CHUNK_DAYS = 120
MAX_THREADS = 60
global_start_ts = 215634600  # 1976

# Only crawl if there's a gap before today
if target_start < global_end_ts - 86400:
    current_end = global_end_ts
    while current_end > target_start:
        c_start = max(target_start, current_end - (CHUNK_DAYS * 86400))
        tasks.append({
            "EXCH": idx["Exchange"], "SYM": sym, "SEG": idx["Segment"],
            "INST": idx["Instrument"], "SEC_ID": idx["IndexID"],
            "EXPCODE": 0, "INTERVAL": "D", "START": c_start, "END": current_end,
            "SAFE_SYM": safe_sym,
        })
        current_end = c_start - 86400
```
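To see how this loop partitions a range, here is a self-contained version that returns just the (start, end) boundaries, walking backwards from the end in 120-day windows:

```python
CHUNK_DAYS = 120
DAY = 86400

def chunk_ranges(target_start, global_end_ts, chunk_days=CHUNK_DAYS):
    """Walk backwards from the end timestamp in chunk_days-sized windows,
    mirroring the chunking loop above."""
    chunks = []
    current_end = global_end_ts
    while current_end > target_start:
        c_start = max(target_start, current_end - chunk_days * DAY)
        chunks.append((c_start, current_end))
        current_end = c_start - DAY
    return chunks

# 400 days of history splits into four chunks, newest first
ranges = chunk_ranges(0, 400 * DAY)
```

Walking newest-first means the most recent (and most useful) data arrives even if older chunks fail.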
3. Parallel API Execution
Uses ThreadPoolExecutor for high-speed fetching:
```python
with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
    future_to_payload = {executor.submit(fetch_chunk, t): t for t in tasks}
    for future in as_completed(future_to_payload):
        payload = future_to_payload[future]
        rows = future.result()
        if rows:
            new_data[payload["SAFE_SYM"]].extend(rows)
```
```python
def fetch_chunk(payload):
    try:
        r = requests.post("https://openweb-ticks.dhan.co/getDataH",
                          json=payload, headers=get_headers(), timeout=10)
        if r.status_code == 200:
            data = r.json().get("data", {})
            times = data.get("Time", [])
            if not times:
                return []
            o, h, l, c, v = (data.get("o", []), data.get("h", []),
                             data.get("l", []), data.get("c", []),
                             data.get("v", []))
            rows = []
            for i, t in enumerate(times):
                # API may return dates as strings or Unix timestamps
                dt_str = t if isinstance(t, str) else datetime.fromtimestamp(t).strftime("%Y-%m-%d")
                rows.append({
                    'Date': dt_str,
                    'Open': o[i], 'High': h[i], 'Low': l[i],
                    'Close': c[i], 'Volume': v[i],
                })
            return rows
    except Exception:
        # Timeouts and network errors: skip this chunk silently
        pass
    return []
```
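The submit/as_completed pattern above can be exercised without the network by swapping in a stub fetcher (the stub and its payload keys mirror the script; the data is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from collections import defaultdict

def fetch_stub(payload):
    """Stand-in for fetch_chunk: returns one fake row per task."""
    return [{"Date": payload["START"], "Close": 1.0}]

tasks = [{"SAFE_SYM": "NIFTY", "START": s}
         for s in ("2024-01-01", "2024-01-02")]
new_data = defaultdict(list)

with ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future back to its payload so results can be grouped by symbol
    future_to_payload = {executor.submit(fetch_stub, t): t for t in tasks}
    for future in as_completed(future_to_payload):
        payload = future_to_payload[future]
        rows = future.result()
        if rows:
            new_data[payload["SAFE_SYM"]].extend(rows)
```

Using a defaultdict(list) for new_data keeps the extend call safe even for symbols that have produced no rows yet.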
4. Live Data Merge
Merges today’s snapshot with historical data:
```python
today_str = datetime.now().strftime("%Y-%m-%d")
for idx in indices:
    safe_sym = get_safe_sym(idx["Symbol"])

    # 1. Start with existing or freshly fetched historical data
    base_rows = existing_data_cache.get(safe_sym, [])
    fetched_rows = new_data.get(safe_sym, [])
    all_rows = base_rows + fetched_rows

    # 2. Add TODAY'S snapshot from all_indices_list.json
    today_row = {
        'Date': today_str,
        'Open': idx.get('Open'),
        'High': idx.get('High'),
        'Low': idx.get('Low'),
        'Close': idx.get('Ltp'),  # Ltp is Close for the running day
        'Volume': idx.get('Volume', 0),
    }

    # 3. Deduplicate by date and apply today's snapshot last
    merged = {r['Date']: r for r in all_rows}
    merged[today_str] = today_row
    final_rows = sorted(merged.values(), key=lambda x: x['Date'])
```
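The dedup-and-sort step can be exercised in isolation. In the dict comprehension, later rows win for a duplicate Date, and assigning the live snapshot afterwards guarantees it overrides any stale row for today (rows trimmed to Date/Close for brevity; values illustrative):

```python
# Historical rows, including a stale intraday value for today
base_rows = [
    {"Date": "2024-01-09", "Close": 18000.0},
    {"Date": "2024-01-10", "Close": 18050.0},  # stale intraday value
]
fetched_rows = [{"Date": "2024-01-08", "Close": 17950.0}]
today_row = {"Date": "2024-01-10", "Close": 18085.5}  # live Ltp snapshot

# Keying by Date deduplicates; today's snapshot is applied last so it wins
merged = {r["Date"]: r for r in base_rows + fetched_rows}
merged[today_row["Date"]] = today_row
final_rows = sorted(merged.values(), key=lambda x: x["Date"])
```

Sorting by the ISO "YYYY-MM-DD" string is safe because lexicographic and chronological order coincide for that format.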
5. CSV Writing
```python
output_path = os.path.join(OUTPUT_DIR, f"{safe_sym}.csv")
with open(output_path, "w", newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['Date', 'Open', 'High', 'Low', 'Close', 'Volume'])
    writer.writeheader()
    writer.writerows(final_rows)
```
Configuration
- CHUNK_DAYS (int, default: 120): Number of days per API request chunk (sized for rate limiting)
- MAX_THREADS (int, default: 60): Maximum number of concurrent API requests
- OUTPUT_DIR (string, default: "indices_ohlcv_data"): Directory where index CSV files are saved
Supported Indices
The script processes all indices in all_indices_list.json, commonly including:
- NIFTY (Nifty 50)
- NIFTY 500
- NIFTY MIDCAP 150
- NIFTY SMALLCAP 250
- NIFTY MIDSMALLCAP 400
- Sector indices (Bank, IT, Pharma, etc.)
- Thematic indices
Usage Example
```shell
python fetch_indices_ohlcv.py
```
Expected Output:
```
Checking 147 indices for sync...
Executing 423 API chunks for history...
Merging with Live Snapshots and saving CSVs...
Successfully updated all index CSVs with Today's Live data.
```
Incremental Updates: The script only fetches date ranges that are missing from existing CSV files, making subsequent runs extremely fast.
Multithreading: With 60 concurrent threads, the script can fetch decades of data for 100+ indices in under 2 minutes.
Error Handling
- API Timeout: Individual chunk failures are silently skipped (fetch_chunk returns an empty list)
- Malformed CSV: If an existing CSV is corrupted, the script re-fetches that index’s full history
- Missing today’s data: If Ltp is missing from the input JSON, 0 is used as a fallback
- Date parsing errors: Handles both string dates and Unix timestamps from the API
Data Quality
Today’s Data Source: The “Close” price for today comes from the Ltp field in all_indices_list.json, which updates in real-time during market hours. After market close, this represents the final close price.
Symbol Sanitization
```python
def get_safe_sym(sym):
    return "".join([c if c.isalnum() else "_" for c in sym])
```
Converts symbols to filesystem-safe names:
"NIFTY MIDCAP 150" → "NIFTY_MIDCAP_150"
"NIFTY 50" → "NIFTY_50"
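The conversions above can be checked directly (the function is repeated here so the example is self-contained):

```python
def get_safe_sym(sym):
    # Replace every non-alphanumeric character with an underscore
    return "".join([c if c.isalnum() else "_" for c in sym])

print(get_safe_sym("NIFTY MIDCAP 150"))  # NIFTY_MIDCAP_150
print(get_safe_sym("NIFTY 50"))          # NIFTY_50
```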
API Endpoint
URL: https://openweb-ticks.dhan.co/getDataH
Method: POST
Payload Example:
```json
{
  "EXCH": "NSE",
  "SYM": "NIFTY",
  "SEG": "IDX",
  "INST": "INDEX",
  "SEC_ID": 13,
  "EXPCODE": 0,
  "INTERVAL": "D",
  "START": 1640995200,
  "END": 1672531200
}
```
Response Example:
```
{
  "data": {
    "Time": [1640995200, 1641081600, ...],
    "o": [18000.0, 18050.0, ...],
    "h": [18100.0, 18150.0, ...],
    "l": [17950.0, 18000.0, ...],
    "c": [18030.0, 18080.0, ...],
    "v": [5000000, 5200000, ...]
  }
}
```
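The columnar arrays in the response map one-to-one onto per-day rows. A minimal parser over a sample shaped like the response above, mirroring fetch_chunk without the HTTP call (UTC is used here to keep the sketch deterministic; the script itself uses local time):

```python
from datetime import datetime, timezone

# Sample columnar payload shaped like the response above (values illustrative)
data = {
    "Time": [1640995200, 1641081600],
    "o": [18000.0, 18050.0],
    "h": [18100.0, 18150.0],
    "l": [17950.0, 18000.0],
    "c": [18030.0, 18080.0],
    "v": [5000000, 5200000],
}

def to_rows(data):
    """Zip the parallel O/H/L/C/V arrays into one dict per trading day."""
    rows = []
    for t, o, h, l, c, v in zip(data["Time"], data["o"], data["h"],
                                data["l"], data["c"], data["v"]):
        # Epoch seconds -> YYYY-MM-DD
        dt_str = datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y-%m-%d")
        rows.append({"Date": dt_str, "Open": o, "High": h,
                     "Low": l, "Close": c, "Volume": v})
    return rows

rows = to_rows(data)
```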